Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning
نویسندگان
چکیده
Many factory optimization problems, from inventory control to scheduling and reliability , can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to nd a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward algorithm called SMART for nd-ing gain-optimal policies in continuous time semi-Markov decision processes. The paper presents a detailed experimental study of SMART on a large unreliable production inventory problem. SMART outperforms two well-known reliability heuristics from industrial engineering. A key feature of this study is the integration of the reinforcement learning algorithm directly into two commercial discrete-event simulation packages, ARENA and CSIM, paving the way for this approach to be applied to many other factory optimization problems for which there already exist simulation models.
منابع مشابه
Continuous-Time Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (RL) is a general framework which studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semiMarkov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounte...
متن کاملExtending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models
Hierarchical reinforcement learning (HRL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work on HRL has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. In this paper we generalize the setting of HRL to averagereward, continuous-time and multi-agent SMDP mo...
متن کاملModel-Based Average Reward Reinforcement Learning
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning meth...
متن کاملHierarchical Average Reward Reinforcement Learning
Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes provides a theoretical basis for HRL. Several variant representati...
متن کاملReinforcement Learning for Motor Primitives
Humans demonstrate a large variety of very complicated motor skills in their day-to-day life. Their agility and adaptability to new control task remains unmatched by the millions of robots laboring on factory floors and roaming research labs. Achieving the abilities of learning and improving new motor skills has become an essential component in order to get a step closer to human-like motor ski...
متن کامل